

Web Scraping with Python and Introduction to Text Data with Python

in partnership with NCRM

This course provides the foundations for you to understand, carry out and communicate text data analysis in Python, a widely used programming language for data analysis. It introduces additional Python skills and requires prior introductory experience with Python.


Event details

This practical, face-to-face course is delivered over two days. It provides the technical programming skills and the understanding of data science techniques that you will need to research both existing and new social, political and economic issues, together with transferable skills that are currently in demand in the job market.

Text data surrounds us and comes in many shapes and sizes, for example newspaper articles, tweets, product reviews and song lyrics. Although it may seem at first glance that such information can hardly be summarised and compared, computational techniques make it possible to extract meaningful information from text data.


Requirements:

This training can be taken on its own by participants with prior Python experience, or as a follow-on from Introduction to Python and Python for Data Analysis on 22nd and 23rd April 2024.

Day 3: Web scraping with Python

  • Introduction to Google Colab (participants need a working Google/Gmail account they can log in to)
  • pandas DataFrames and uploading external data to Colab
  • How to scrape a web page and extract text with Beautiful Soup
  • How to analyse and visualise text content using the Seaborn library
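As a rough illustration of what "extracting text from a web page" involves, the sketch below pulls the text out of every <p> element in a piece of HTML. It is not course material. Beautiful Soup is a third-party library, so to keep the example self-contained it swaps in Python's standard-library html.parser module; with Beautiful Soup the same step would typically be BeautifulSoup(html, "html.parser") followed by find_all("p"). The sample HTML, the ParagraphExtractor class and the extract_paragraphs helper are hypothetical, and a real scraper would download the page first rather than use a hard-coded string.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a paragraph opens.
        if tag == "p":
            self.in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        # Store the finished paragraph when it closes.
        if tag == "p" and self.in_paragraph:
            self.paragraphs.append("".join(self._buffer).strip())
            self.in_paragraph = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept as part of the paragraph.
        if self.in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page content; a real scraper would fetch it first, e.g. with
# urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
  <h1>Example article</h1>
  <p>Text data comes in <b>many</b> forms.</p>
  <p>Reviews, lyrics and news are all text.</p>
</body></html>
"""

paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
# ['Text data comes in many forms.', 'Reviews, lyrics and news are all text.']
```

The heading text is skipped because only <p> elements are collected; choosing which elements to keep is usually the main design decision when scraping a particular site.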

Day 4: Introduction to Text Data with Python

  • Text preprocessing
  • Bag-of-words modelling with a count vectorizer
  • Lexicon-based sentiment analysis using spaCy
  • Comparative visualisation
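To give a feel for text preprocessing and bag-of-words modelling, the sketch below lower-cases each document, keeps only alphabetic tokens, removes a small stop-word list and then counts how often each vocabulary word appears in each document. It is a plain-Python illustration, not the course's own code: libraries such as scikit-learn provide the counting step as CountVectorizer. The stop-word list, the sample sentences and the helper names preprocess and bag_of_words are assumptions made for this example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; real projects usually
# rely on a fuller list, such as those shipped with spaCy or scikit-learn).
STOP_WORDS = {"the", "a", "and", "is", "it", "of", "to", "was"}


def preprocess(text):
    """Lower-case, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(documents):
    """Return a shared, sorted vocabulary and one count vector per document."""
    token_lists = [preprocess(doc) for doc in documents]
    vocabulary = sorted({tok for tokens in token_lists for tok in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


docs = [
    "The film was great, and the music was great too.",
    "The plot was dull.",
]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['dull', 'film', 'great', 'music', 'plot', 'too']
print(matrix)  # [[0, 1, 2, 1, 0, 1], [1, 0, 0, 0, 1, 0]]
```

Each row of the matrix describes one document and each column one vocabulary word, so documents become directly comparable as numbers, while word order is discarded.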

Refreshments are served from 09.30 for a 10.00 start, lunch is provided, and each day concludes at 16.00. Session lengths may vary from day to day; the overall format is a mix of lectures, demonstrations and practical exercises.